Intersect (-c) mRNA, mCpG
Intersect (-c) mRNA, NONmCpG

Intersect (-c) CDS, mCpG
Intersect (-c) CDS, NONmCpG
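
What the -c intersects above compute, as a minimal sketch: for each feature (mRNA or CDS), count how many sites fall inside it, and report 0 when none do. The coordinates, scaffold names, and gene names are made up for illustration, each CpG is treated as a single position, and intervals are assumed to be 0-based half-open (BED style). This is not the tool's implementation, just the same idea.

```python
from bisect import bisect_left

def count_overlaps(features, sites):
    """For each (chrom, start, end, name) feature, count the (chrom, pos)
    sites with start <= pos < end. Features with no sites get 0."""
    by_chrom = {}
    for chrom, pos in sites:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    counts = {}
    for chrom, start, end, name in features:
        positions = by_chrom.get(chrom, [])
        # Number of sorted positions below `end` minus number below `start`
        counts[name] = bisect_left(positions, end) - bisect_left(positions, start)
    return counts

# Hypothetical toy data, not taken from the oyster genome
mrna = [("scaffold1", 100, 500, "geneA"), ("scaffold1", 800, 900, "geneB")]
mcpg = [("scaffold1", 120), ("scaffold1", 499), ("scaffold1", 500), ("scaffold2", 150)]

mrna_counts = count_overlaps(mrna, mcpg)
print(mrna_counts)
```

The site at position 500 is not counted for geneA because the end coordinate is exclusive under the half-open assumption.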

Join [SQLshare] (subtract to get intron)
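
A minimal sketch of the subtraction step, assuming the per-gene mRNA and CDS counts are already keyed by the same gene ID: intron count = mRNA count - CDS count. The gene names and numbers are hypothetical. Treating a gene with no CDS row as zero CDS sites is an assumption, and anything in the mRNA feature that is not CDS (UTRs, for example) ends up counted as intron under this subtraction.

```python
def intron_counts(mrna_counts, cds_counts):
    """Join two per-gene count tables on gene ID and subtract CDS from mRNA.
    Genes missing from cds_counts are treated as having 0 CDS sites (assumption)."""
    return {
        gene: total - cds_counts.get(gene, 0)
        for gene, total in mrna_counts.items()
    }

# Hypothetical per-gene counts (CDS already summed over exons)
mrna_mcpg = {"geneA": 12, "geneB": 5, "geneC": 3}
cds_mcpg = {"geneA": 9, "geneB": 5}

introns = intron_counts(mrna_mcpg, cds_mcpg)
print(introns)
```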

example
SELECT * FROM [sr320@washington.edu].[fish546_module1_blast_table]
  INNER JOIN [dhalperi@washington.edu].[gp_association.goa_uniprot]
  ON [sr320@washington.edu].[fish546_module1_blast_table].SPID=[dhalperi@washington.edu].[gp_association.goa_uniprot].Column2

Various code
Select Count(mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
#counts - 196691

Select sum(mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
#sum 246609

Select ID, SUM(mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
  Group by ID
#AWESOME

Select ID, avg(mCpGcount), min(mCpGcount), max(mCpGcount), sum(mCpGcount), count(mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
  Group by ID
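
To sanity-check the per-gene stats query outside SQLShare, here is a small sketch that runs the same aggregate pattern against an in-memory SQLite table. The table name cds_int_mcpg and the rows are hypothetical stand-ins for the CDS intersect output (one row per CDS segment), and SQLite is only a substitute engine for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cds_int_mcpg (ID TEXT, mCpGcount INTEGER)")

# Hypothetical rows: one per CDS segment, with its mCpG count
rows = [("geneA", 4), ("geneA", 0), ("geneA", 5), ("geneB", 2)]
conn.executemany("INSERT INTO cds_int_mcpg VALUES (?, ?)", rows)

# Same shape as the notebook query: per-gene avg, min, max, sum, count
stats = conn.execute(
    """
    SELECT ID, AVG(mCpGcount), MIN(mCpGcount), MAX(mCpGcount),
           SUM(mCpGcount), COUNT(mCpGcount)
    FROM cds_int_mcpg
    GROUP BY ID
    ORDER BY ID
    """
).fetchall()
conn.close()

for row in stats:
    print(row)
```

Here count is the number of CDS segments per gene and sum is the total mCpG count for the gene's CDS.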



Select * From [sr320@washington.edu].[fish546TJGR_mRNA_int_mCpG_2]
  Inner join [sr320@washington.edu].[Stats_CDS_int_mCpG]
  ON [sr320@washington.edu].[fish546TJGR_mRNA_int_mCpG_2].ID=[sr320@washington.edu].[Stats_CDS_int_mCpG].ID
#Downloaded as Methylated CpG dataset


Select * From [sr320@washington.edu].[fish546TJGR_mRNA_int_NOmCpG_2]
  Inner join [sr320@washington.edu].[Stats_CDS_int_NOmCpG]
  ON [sr320@washington.edu].[fish546TJGR_mRNA_int_NOmCpG_2].ID=[sr320@washington.edu].[Stats_CDS_int_NOmCpG].ID
#Downloaded as NO methylated dataset



Select ID, avg(NOmCpGcount), min(NOmCpGcount), max(NOmCpGcount), sum(NOmCpGcount), count(NOmCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_NOmCpG_2]
  Group by ID
#Saved as Stats_CDS_int_NOmCpG


INTO excel (oh no!)



4042 genes have no Bisulfite Data.

Got intron data- now back in SQLshare

some codes
SELECT * FROM [sr320@washington.edu].[AggCo Oyster Bisulfite mRNA and CDS]
  Where "SUM mRNA" > 100
  and "Ratio mCDS/mIntron" > 3


Let's get some gene names



SELECT * FROM [sr320@washington.edu].[BSoysterGENE]
  Where "SUM mRNA" > 100
  and "Ratio mCDS/mIntron" > 3


SELECT * FROM [sr320@washington.edu].[BSoysterGENE]
  Where "SUM mRNA" > 100
  and "Percent mCpG (CDS)" > 75
  and "Percent mCpG (Intron)" < 25




Joining with expression data












Histogram

#read in table
data<-read.csv("/Users/sr320/Desktop/TJGRR/AggCo.csv")

#view 
head(data)

library(ggplot2)

qplot(data$Percent.mCpG..CDS., binwidth=1)


qplot(data$Ratio.mCDS.mIntron, binwidth=.1)


ggplot(data, aes(x=mCpGcount)) + geom_histogram(binwidth=.5) + scale_x_continuous(limits = c(0, 300))

ggplot(data, aes(x=mCpGcount)) + geom_histogram(binwidth=.5)

ggplot(data, aes(x=CDScount, y=Ratio.mCDS.mIntron)) + geom_point(shape=1)







Intersect (-c) Expression based on mRNA (need to create)

On iPlant have BAM (accepted hits) one Mgo on genome.
Need to get (split) coverage across mRNA

or 

Should be able to just get RPKM 
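
For reference, a minimal sketch of the standard RPKM calculation (reads per kilobase of transcript per million mapped reads), assuming per-mRNA read counts and the total mapped read count are already in hand, for example from the accepted-hits BAM. The function name and the numbers are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """RPKM = reads * 1e9 / (feature length in bp * total mapped reads)."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

# Hypothetical example: 500 reads on a 2,000 bp mRNA, 10 million mapped reads
value = rpkm(500, 2000, 10_000_000)
print(value)
```

Dividing by length and by library size makes values comparable across genes of different lengths and across libraries of different depth.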



--------------------
Take clusters and ID using Blast



---------------- 
Clusters closest

done



--------------------
Study those mCpG that are same between Gill and Sperm